Geometric Dirichlet Means Algorithm for topic inference
Abstract
We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end we study the optimization of a geometric loss function, which is a surrogate to the LDA likelihood. Our method involves a fast optimization-based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under some conditions. The algorithm is evaluated with extensive experiments on simulated and real data.
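The abstract leaves the loss function and correction unspecified, so the following is only a minimal Python sketch of the general idea it describes: represent each document as a point on the word simplex, run a weighted k-means clustering, and apply a geometric correction that pushes each centroid away from the global mean. The scale parameter and the simplex projection here are illustrative assumptions, not the paper's exact procedure.

import numpy as np
from sklearn.cluster import KMeans

def project_to_simplex(v):
    # Euclidean projection of a vector onto the probability simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def gdm_topics(doc_word_counts, n_topics, scale=2.0, seed=0):
    # doc_word_counts: (n_docs, vocab_size) array of word counts.
    lengths = doc_word_counts.sum(axis=1)
    points = doc_word_counts / lengths[:, None]            # documents as points on the simplex
    km = KMeans(n_clusters=n_topics, n_init=10, random_state=seed)
    km.fit(points, sample_weight=lengths)                  # longer documents carry more weight
    center = np.average(points, axis=0, weights=lengths)   # weighted global mean
    topics = []
    for c in km.cluster_centers_:
        # Geometric correction (illustrative): move each centroid away from the
        # global mean toward the boundary of the simplex, then project back.
        topics.append(project_to_simplex(center + scale * (c - center)))
    return np.vstack(topics)

Applied to a document-term count matrix (e.g., from scikit-learn's CountVectorizer), this returns an n_topics × vocab_size array whose rows are estimated topic distributions.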
Similar articles
Approximate Mean Field for Dirichlet-Based Models
Variational inference is an important class of approximate inference techniques that has been applied to many graphical models, including topic models. We propose to improve the efficiency of mean field inference for Dirichlet-based models by introducing an approximative framework that converts weighted geometric means in the updates into weighted arithmetic means. This paper also discusses a c...
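As a standalone illustration of the substitution mentioned above (not the paper's actual update equations), a weighted geometric mean of nonnegative quantities can be replaced by the corresponding weighted arithmetic mean, which upper-bounds it by the weighted AM–GM inequality:

$$\prod_{k} x_k^{w_k} \;\le\; \sum_{k} w_k x_k, \qquad x_k \ge 0,\; w_k \ge 0,\; \sum_{k} w_k = 1.$$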
A Theoretical and Practical Implementation Tutorial on Topic Modeling and Gibbs Sampling
This technical report provides a tutorial on the theoretical details of probabilistic topic modeling and gives practical steps on implementing topic models such as Latent Dirichlet Allocation (LDA) through the Markov Chain Monte Carlo approximate inference algorithm Gibbs Sampling.
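For reference, the collapsed Gibbs sampler that such tutorials derive resamples the topic assignment of each token $w_i$ in document $d$ from its conditional distribution, where $n^{-i}$ denotes counts excluding token $i$, $V$ is the vocabulary size, and $\alpha, \beta$ are symmetric Dirichlet hyperparameters:

$$P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \left(n^{-i}_{d,k} + \alpha\right)\,\frac{n^{-i}_{k,w_i} + \beta}{n^{-i}_{k,\cdot} + V\beta}.$$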
Neural Variational Inference For Topic Models
Topic models are one of the most popular methods for learning representations of text, but a major challenge is that any change to the topic model requires mathematically deriving a new inference algorithm. A promising approach to address this problem is neural variational inference (NVI), but they have proven difficult to apply to topic models in practice. We present what is to our knowledge t...
Language model adaptation using latent dirichlet allocation and an efficient topic inference algorithm
We present an effort to perform topic mixture-based language model adaptation using latent Dirichlet allocation (LDA). We use probabilistic latent semantic analysis (PLSA) to automatically cluster a heterogeneous training corpus, and train an LDAmodel using the resultant topicdocument assignments. Using this LDA model, we then construct topic-specific corpora at the utterance level for interpol...
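In the spirit of that interpolation step (the cited paper's exact scheme may differ), a topic-mixture-adapted language model combines topic-specific models $p_k$ with nonnegative mixture weights $\lambda_k$ estimated for the current utterance or document:

$$p_{\text{adapt}}(w \mid h) = \sum_{k=1}^{K} \lambda_k\, p_k(w \mid h), \qquad \sum_{k=1}^{K} \lambda_k = 1.$$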
Autoencoding Variational Inference for Topic Models
Topic models are one of the most popular methods for learning representations of text, but a major challenge is that any change to the topic model requires mathematically deriving a new inference algorithm. A promising approach to address this problem is autoencoding variational Bayes (AEVB), but it has proven difficult to apply to topic models in practice. We present what is to our knowledge t...
Publication date: 2016